% perm filename CHAP11.TEX[WEB,ALS] blob sn#690217 filedate 1982-12-15 generic text, type T, neo UTF8
\chapterbegin Chapter 11. Boxes

\TeX\ makes complicated pages by starting with simple individual characters
and putting them together in larger units, and putting these together in still
larger units, and so on. Conceptually, it's a big paste-up job. The \TeX nical
terms used to describe such page construction are {\sl ↑{boxes}\/} and
{\sl ↑{glue}}.

Boxes in \TeX\ are two-dimensional things with a rectangular shape, having
three associated measurements called {\sl↑{height}}, {\sl↑{width}}, and
{\sl↑{depth}}.  Here is a picture of a typical box, showing its so-called
↑{reference point} and ↑{baseline}:
$$\eightpoint
\setbox0=\hbox{$\uparrow$}
\setbox1=\hbox to 1wd0{$\hss\relv\hss$} % with luck, they'll line up
\setbox2=\vbox{\copy0
  \nointerlineskip \kern-.5pt \copy1
  \nointerlineskip \kern-.5pt \copy1
  \moveleft 1em\hbox{height}
  \copy1 \nointerlineskip \kern-.5pt
  \copy1 \nointerlineskip \kern-.5pt
  \hbox{$\downarrow$}
  \kern.2pt}
\setbox3=\vbox{\kern.2pt\copy0
  \moveleft 1em\hbox{depth}
  \hbox{$\downarrow$}
  \kern0pt}
\setbox4=\vtop{\kern-3pt % this cancels the null text above the samplebox
  \hbox{\samplebox{1ht2}{1ht3}{6em}{}%
    \kern-6em
    \raise3pt\hbox to 6em{\hss Baseline\hss}}
  \kern3pt
  \arrows{6em}{width}}
\dbox{\setbox0=\hbox{$\vcenter{}$}% 1ht0 is the axis height
  \lower1ht0\hbox{Reference point$-$\kern-.2em$\rightarrow$\kern2pt}%
  \raise1ht2\box4
  \kern1.5em
  \raise1ht2\vtop{\kern0pt\box2\nointerlineskip\box3}\hss}$$
From \TeX's viewpoint, a single character from a font is a box; it's one
of the simplest kinds of boxes. The font designer has decided what the
height, width, and depth of the character are, and what the symbol will
look like when it is in the box; \TeX\ uses these dimensions to paste
boxes together, and ultimately to determine the locations of the reference
points for all characters on a page.  In plain \TeX's |\rm| font (cmr10), for
example, the letter `h' has a height of 6.9444 points, a width of 5.5556
points, and a depth of zero; the letter `g' has a height of 4.3055
points, a width of 5 points, and a depth of 1.9444 points. Only certain
special characters like parentheses have height plus depth actually equal
to 10 points, although ↑{cmr10} is said to be a ``10-point'' font. You needn't
bother to learn these measurements yourself, but it's good to be aware of
the fact that \TeX\ deals with such information; then you can better
understand what the computer does to your manuscript.
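
One way to inspect such measurements yourself, sketched here with the
|\setbox| and |\showbox| instructions that appear later in this chapter,
is to put a single letter into a box register and ask \TeX\ to display it:
\ttbegin
\setbox0=\hbox{h} \showbox0
\ttend
Assuming that the current font is cmr10, the first line of the resulting
display should report the height, depth, and width of the enclosing hbox;
these are simply the dimensions of the letter `h', since nothing else is
in the box.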

The character shape need not fit inside the boundaries of its box. For example,
some characters that are used to build up larger math symbols like matrix
brackets intentionally protrude a little bit, so that they overlap
properly with the rest of the symbol. Slanted letters frequently extend a
little to the right of the box, as if the box were skewed right at the top
and left at the bottom, keeping its baseline fixed. For example, compare
the letter `g' in the cmr10 and ↑{cms10} fonts (|\rm| and |\sl|):
\figure{40pt}{(A figure will be inserted here; too bad you can't see it now.
  It shows two g's, as claimed.)}
In both cases \TeX\ thinks that the box is 5 points wide, so both letters get
exactly the same treatment. \TeX\ doesn't have any idea where the ink will
go---only the output device knows this. But the slanted letters will be
spaced properly in spite of \TeX's lack of knowledge, because the baselines
will match up.

Actually the font designer also tells \TeX\ one other thing, the so-called
{\sl↑{italic correction}\/}: A number is specified for each character,
telling roughly how far that character extends to the right of its box
boundary, plus a little to spare. For example,
the italic correction for `g' in cmr10 is $0.1389\pt$, while in cms10 it is
$0.8565\pt$. Chapter@4 points out that this correction is added to the
normal width if you type `↑{*/}' just after the character. You should remember
to use |\/| when shifting from a slanted font to an unslanted one, especially
in cases like
\ttbegin
the so-called {\sl italic correction\/}:
\ttend
since no space intervenes here to compensate for the loss of slant.

\smallbreak
\TeX\ also deals with another simple kind of box, which might be called
a@``↑{black box},'' namely, a rectangle like
`\thinspace \vrule width 4pt height 6pt depth 1.5pt \thinspace'
that is to be entirely filled with ink at printing time. You can specify any
height, width, and depth you like for such boxes---but they had better not have
too much area, or the printer might get upset. \ (Printers generally
prefer white space to black space.)
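
For example, the small rectangle shown above has the dimensions given in
the following specification; this is only a sketch, since the use of rule
boxes is taken up in detail later:
\ttbegin
\vrule width 4pt height 6pt depth 1.5pt
\ttend
The resulting rectangle is $4\pt$ wide, and it extends $6\pt$ above the
baseline and $1.5\pt$ below it.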

Usually these black boxes are made very skinny, so that they appear as
horizontal lines or vertical lines. Printers traditionally call such lines
``↑{horizontal rules}'' and ``↑{vertical rules},'' so the terms \TeX\ uses
to stand for black boxes are ↑{*hrule} and ↑{*vrule}. Even when the box is
square, as in `\thinspace\bull\thinspace', you must call it either an@|\hrule|
or a@|\vrule|.  We will discuss the use of rule boxes in greater detail later.

\smallbreak
Everything on a page that has been typeset by \TeX\ is made up of simple
character boxes or rule boxes, pasted together in combination. \TeX\
pastes boxes together in two ways, either {\sl horizontally\/} or {\sl
vertically}.  When \TeX\ builds a ↑{horizontal list} of boxes, it lines
them up so that their reference points appear in the same horizontal row;
therefore the baselines of adjacent characters will match up as they
should. Similarly, when \TeX\ builds a ↑{vertical list} of boxes, it lines
them up so that their reference points appear in the same vertical column.

% Here are some macros for making blank boxes
\def\dolist#1{\def\next{#1}
  \ifx\next\endlist \let\next\relax
  \else \\ \let\next\dolist \fi
  \next}
\def\\{\ifx\next\space\ \else \setbox0=\hbox{\next}\maketypebox\fi}
\def\demobox#1{\setbox0=\hbox{\dolist#1\endlist}%
  \copy0\kern-1wd0\makelightbox}

Let's take a look at what \TeX\ does behind the scenes, by comparing the
computer's methods with what you would do if you were setting metal type
by hand. In the time-tested traditional method, you choose the letters that
you@need out of a type case---the upper-case letters are in the ↑{upper
case}---and you put them into a ``↑{composing stick}.'' When a line is
complete, you adjust the spacing and transfer the result to the ``chase,''
where it joins the other rows of type. Eventually you lock the type up
tightly by adjusting external wedges called ``quoins.'' This isn't much
different from what \TeX\ does, except that different words are used; when
\TeX\ locks up a line, it creates what is called an ``↑{hbox}''
(↑{horizontal box}), because the components of the line are pieced
together horizontally. You can give an instruction like
\ttbegin
\hbox{A line of type.}
\ttend
in a \TeX\ manuscript; this tells the computer to take boxes for the appropriate
letters in the current font and to lock them up in an hbox. As far as \TeX\ is
concerned, the letter `A' is a box
`\thinspace\setbox0\hbox{A}\maketypebox\thinspace'
and the letter `p' is a box `\thinspace\setbox0\hbox{p}\maketypebox\thinspace'.
So the given instruction causes \TeX\ to form the hbox
$$\displaybox{\demobox{A line of type.}}$$
representing `A line of type.' The hboxes for individual lines of type are
eventually joined together by putting them into a ``↑{vbox}'' (↑{vertical
box}). For example, you can say
\ttbegin
\vbox{\hbox{Two lines}\hbox{of type.}}
\ttend
and \TeX\ will convert this into
$$\setbox0=\vbox{\hbox{\demobox{Two lines}}\hbox{\demobox{of type.}}}
\displaybox{$\vcenter{\hbox{\makelightbox\kern-1wd0\box0}}$\qquad
  i.e.,\qquad$\vcenter{\vbox{\hbox{Two lines}\hbox{of type.}}}$}$$
The principal difference between \TeX's method and the old way is that metal
types are generally cast so that each character has the same height and
depth; this makes it easy to line them up by hand. \TeX's types have
variable height and depth, because the computer has no trouble lining
characters up by their baselines, and because the extra information about
height and depth helps in the positioning of accents and mathematical
symbols.

Another important difference between \TeX\ setting and hand setting is, of
course, that \TeX\ will choose line divisions automatically; you don't
have to insert ↑{*hbox} and ↑{*vbox} instructions unless you want to
retain complete control over where each letter goes. On the other hand,
if you do use |\hbox| and |\vbox|, you can make \TeX\ do almost everything
that Ben ↑{Franklin} could do in his printer's shop. You're only giving
up the ability to make the letters come out charmingly crooked or badly
inked; for such effects you need to make a new font. \ (And of course you
lose the tactile and olfactory sensations, and the thrill of
doing everything by yourself. \TeX\ will never completely replace the
good@old@ways.)

A page of text like the one you're reading is itself a box, in \TeX's view:
It is a largish box made from a vertical list of smaller boxes representing
the lines of text. Each line of text, in turn, is a box made from a
horizontal list of boxes representing the individual characters. In more
complicated situations, involving mathematical formulas and/or complex
tables, you can have boxes within boxes within boxes $\ldots$ to any level.
But even these complicated situations arise from horizontal or vertical lists
of boxes pasted together in a simple way; all that you and \TeX\ have to
worry about is one list of boxes at a time. In fact, when you're typing
straight text, you don't have to think about boxes at all, since \TeX\ will
automatically take responsibility for assembling the character boxes into
words and the words into lines and the lines into pages. You only need to be
aware of the box concept when you want to do something out of the ordinary,
e.g., when you want to center a heading.
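
As a rough sketch of such a case, a heading might be centered by making an
hbox of a specified width, with the text held between two pieces of
stretchable and shrinkable glue. The width of 20 ems and the wording of the
heading are assumptions chosen only for illustration, and |\hss| is glue of
a kind that this chapter does not explain:
\ttbegin
\hbox to 20em{\hss A Centered Heading\hss}
\ttend
The box is 20 ems wide, and whatever space is left over is divided equally
between the two sides of the text.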

\danger From the standpoint of \TeX's digestive processes, a manuscript
comes in as a sequence of tokens, and the tokens are to be transformed into
a sequence of boxes. Each token of input is essentially an instruction or
a piece of an instruction; for example, the token `|A|$↓{11}$' normally means,
``put a character box for the letter |A| at the end of the current hbox,
using the current font''; the token `|\vskip|' normally means, ``skip
vertically in the current vbox by the \<dimen> specified in the 
following tokens.''

\danger The height, width, or depth of a box might be negative, in which
case it is a ``shadow box'' that is somewhat hard to draw. \TeX\ doesn't
balk at ↑{negative dimensions}; it just does arithmetic as usual. For example,
the combined width of two adjacent boxes is the sum of their widths, whether
or not the widths are positive.  A font designer can declare a character's
width to be negative, in which case the character acts like a backspace. \
(Languages that read from right to left could be
handled in this way, but only to a limited extent, since \TeX's line-breaking
↑(Hebrew) ↑(Arabic)
algorithm is based on the assumption that words don't have negative widths.)
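
\danger For instance, the following sketch puts two empty boxes side by side,
one of them having negative width; it uses the |\setbox| and |\showbox|
instructions that are described later in this chapter, and the particular
widths $5\pt$ and $-2\pt$ are chosen only for illustration:
\ttbegin
\setbox0=\hbox{\hbox to 5pt{}\hbox to -2pt{}} \showbox0
\ttend
The enclosing hbox should turn out to be $3\pt$ wide, since
$5\pt+(-2\pt)=3\pt$.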

\danger \TeX\ can raise or lower the reference points of individual boxes
in a horizontal list. Such adjustments take care of mathematical
subscripts and superscripts, as well as the heights of accents and a few
other things. For example, here is a way to make a box that contains
the \TeX\ logo, putting it into \TeX's internal register |\box0|:
\ttbegin
\setbox0=\hbox{T\kern-.1667em\lower.424ex\hbox{E}\kern-.125em X}
\ttend
↑(*setbox)
Here `↑{*kern}|-.1667em|' means to insert blank space of $-.1667$ ems in the
current font, i.e., to back up a bit; and `↑{*lower}|.424ex|' means that
the box |\hbox{E}| is to be lowered by 42.4\%\ of the current x-height, thus
offsetting that box with respect to the others. Instead of
`|\lower.424ex|' one could also say `↑{*raise}|-.424ex|'. Chapter@21
discusses the details of how to construct boxes for special effects;
our goal in the present chapter is merely to get a taste of the
possibilities.
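
\danger As a sketch of the equivalence just mentioned, the same logo could
be built with |\raise| and a negative amount in place of |\lower|; the
choice of register |\box2| here is arbitrary:
\ttbegin
\setbox2=\hbox{T\kern-.1667em\raise-.424ex\hbox{E}\kern-.125em X}
\ttend
The contents of |\box2| should then be the same as the contents of |\box0|
above, because raising a box by a negative amount is the same as lowering it
by the corresponding positive amount.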

\danger \TeX\ will exhibit the contents of any ↑{box register}, if you
ask it to. For example, if you type `↑{*showbox}|0|' after setting
|\box0| to the \TeX\ logo as above, your ↑{log file} will contain
the following mumbo jumbo:
\ttbegin
\hbox(6.83333+1.82553)x18.61073
.\tenrm T
.\kern-1.66702
.\hbox(6.83333+0.0)x6.80554, shifted 1.82553
..\tenrm E
.\kern-1.25
.\tenrm X
\ttend
↑(diagnostic format) ↑(internal box-and-glue representation) ↑(box displays)
The first line means that |\box0| is an hbox whose height, depth, and width
are respectively $6.83333\pt$, $1.82553\pt$, and $18.61073\pt$.
Subsequent lines beginning with `|.|' indicate that they are {\sl inside\/}
of a box. The first thing in this particular box is the letter@|T| in
font |\tenrm|; then comes a kern. The next item is an hbox that contains
only the letter@|E|; this box has the height, depth, and width of an |E|, and
it has been shifted downward by $1.82553\pt$ (thereby accounting for
the depth of the larger box).

\dangerexercise Why are there two dots in the `|..\tenrm E|' line here?
\answer This |E| is inside a box that's inside a box.

\danger Such displays of box contents will be discussed further in
Chapters 12 and@27.
They are used primarily for diagnostic purposes, when you are trying to figure
out exactly what \TeX\ thinks it's doing. The main reason for bringing them
up in the present chapter is simply to provide a glimpse of how \TeX\ represents
boxes in its guts. A computer program doesn't really move boxes around; it
fiddles with lists of representations of boxes.

\dangerexercise By running \TeX, figure out how it actually handles italic
corrections to characters: how are the corrections represented inside a box?
\answer The idea is to construct a box and to look inside. For example,
\ttbegin
\setbox0=\hbox{\sl g\/} \showbox0
\ttend
reveals that |\/| is implemented by placing a kern after the character.
Further experiment shows that this kern is inserted even when the italic
correction is zero.

\dangerexercise The ``opposite'' of \TeX's logo---namely,
T\kern+.1667em\raise.424ex\hbox{E}\kern+.125em X---is produced by
\ttbegin
\setbox1=\hbox{T\kern+.1667em\raise.424ex\hbox{E}\kern+.125em X}
\ttend
What would |\showbox1| show now? \ (Try to guess, without running the machine.)
\answer The height, depth, and width of the enclosing box should be just large
enough to enclose all of the contents, so the result is:
\ttbegin
\hbox(8.65886+0.0)x24.44478
.\tenrm T
.\kern1.66702
.\hbox(6.83333+0.0)x6.80554, shifted -1.82553
..\tenrm E
.\kern1.25
.\tenrm X
\ttend
(You probably predicted a width of |24.44477|; \TeX's internal calculations are
in |sp|, not |pt|/100000, so the rounding in the fifth decimal place is not
readily predictable.)
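The prediction itself is simple arithmetic on the earlier display of |\box0|.
The |E| is now raised instead of lowered, so its shift adds to the height
rather than to the depth; and each kern changes sign, so the width grows by
twice the magnitude of each kern:
$$6.83333\pt+1.82553\pt=8.65886\pt,\qquad
  18.61073\pt+2(1.66702\pt+1.25\pt)=24.44477\pt.$$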

\dangerexercise Why do you think the author of \TeX\ didn't make boxes more
symmetrical between horizontal and vertical, by allowing reference points
to be inside the boundary instead of insisting that the reference point
must appear at the left edge of each box?
\answer No applications of such symmetrical boxes to English-language
printing were apparent; it seemed pointless to carry extra generality
as useless baggage that would rarely if ever be used, merely for the sake of
symmetry. In other words, the author wore a computer science cap instead
of a mathematician's mantle, on the day that \TeX's boxes were born.
Time will tell whether or not this was a fundamental error!

\ddangerexercise Construct a |\demobox| macro for use in writing manuals
like this, so that an author can write `|\demobox{Tough exercise.}|'
in order to typeset `\thinspace\demobox{Tough exercise.}\thinspace'.
\answer The following solution is based on a general |\makeblankbox|
macro that prints the edges of a box using rules of given thickness
outside and inside that box; the box dimensions are those of |\box0|.
It is assumed that the macros of Appendix@E are already present.\par
|\def\dolist#1{\def\next{#1}|\par
|  \ifx\next\endlist \let\next\relax|\par
|  \else \\ \let\next\dolist \fi|\par
|  \next}|\par
|\def\hidehrule#1#2{\kern-#1\hrule height#1 depth#2 \kern-#2 }|\par
|\def\hidevrule#1#2{\kern-#1{\setdimen0=#1|\par
|    \advdimen0 by#2\vrule width1dm0}\kern-#2 }|\par
|\def\makeblankbox#1#2{\hbox{\lower1dp0\vbox{\hidehrule{#1}{#2}%|\par
|    \kern-#1 % overlap the rules at the corners|\par
|    \hbox to 1wd0{\hidevrule{#1}{#2}%|\par
|      \raise1ht0\vbox to #1{}% set the vrule height|\par
|      \lower1dp0\vtop to #1{}% set the vrule depth|\par
|      \hfil\hidevrule{#2}{#1}}%|\par
|    \kern-#1\hidehrule{#2}{#1}}}}|\par
|\def\maketypebox{\makeblankbox{0pt}{1pt}}|\par
|\def\makelightbox{\makeblankbox{.2pt}{.2pt}}|\par
|\def\\{\ifx\next\space\ |\par
|  \else \setbox0=\hbox{\next}\maketypebox\fi}|\par
|\def\demobox#1{\setbox0=\hbox{\dolist#1\endlist}%|\par
|  \copy0\kern-1wd0\makelightbox}|\par

\chapterend

I have several boxes in my memory
in which I will keep them all very safe,
% he's talking about "instructions"
there shall not a one of them be lost.
\author IZAAK ↑{WALTON}, {\sl The Compleat Angler\/} (1653) % beginning Chap12
% in 1654 and subsequent editions, this quote comes in Chap17
% the 1653 spelling agrees with 20th century conventions in this passage!

\bigskip

The only thing that never looks right is a rule.
There is not in existence a page with a rule on it
that cannot be instantly and obviously improved
by taking the rule out.
% "Even dashes, cherished as they are by authors who cannot punctuate,
% spoil a page. They are generally merely ignorant substitutes for colons."
\author GEORGE BERNARD ↑{SHAW}, in {\sl The Dolphin\/} (1940) % v4 p81

\eject